(57) Abstract

The architecture of numerous networks, including the Internet with its World Wide Web (WWW) browsers and servers, supports full
file transfer for document retrieval. For the WWW to support continuous media, video and audio must be transmitted on
demand and in real time, and new protocols for real-time data are needed. The invention extends the architecture of the WWW to encompass
the dynamic real-time information space of video and audio. The inventive method, called Vosaic, short for Video Mosaic, incorporates
real-time video and audio into standard hypertext pages, where they are displayed in place. Video and audio transfers occur in real time;
there is no file retrieval latency. The video and audio result in compelling Web pages. Real-time video and audio data can be effectively
served over the present-day Internet with the proper transmission protocol. The invention includes a real-time protocol, called a video
datagram protocol (VDP), for handling real-time video over the WWW. VDP minimizes inter-frame jitter and dynamically adapts to the 
client CPU load and network congestion. The video server in accordance with the invention dynamically changes transfer protocols, adapting 
to the request stream. The invention also is applicable to other networks using Internet-type protocols such as TCP/IP, including local area 
networks, metropolitan area networks, and wide area networks. 
WO 97/22201 PCT/US96/19226

METHOD OF AND SYSTEM FOR TRANSMITTING AND/OR RETRIEVING 
REAL-TIME VIDEO AND AUDIO INFORMATION 
OVER PERFORMANCE-LIMITED TRANSMISSION SYSTEMS 

FIELD OF THE INVENTION 

The present invention relates to a method of and system for transmitting
and/or retrieving real-time video and audio information. The inventive method
compensates for congested conditions and other performance limitations in a
transmission system over which the video information is being transmitted. More
particularly, the invention relates to a method of transmitting and/or retrieving
real-time video and audio information over the Internet, specifically the World Wide Web.

BACKGROUND OF THE INVENTION 

"Surfing the Web" has entered the common vocabulary relatively recently. 
Individuals and businesses have come to use the Internet both for electronic mail
(e-mail) and for access to information, commonly over the World Wide Web (WWW, or
the Web). As modem speeds have increased, so has Web traffic.

Web browsers, such as National Center for Supercomputing Applications (NCSA)
Mosaic, allow users to access and retrieve documents on the Internet. These 
documents most often are written in a language called HyperText Markup Language 
(HTML). Traditional information systems design for World Wide Web clients and 
servers has concentrated on document retrieval and the structuring of
document-based information, for example, through hierarchical menu systems as are used in
Gopher, or links in hypertext as in HTML. 

Current information systems architecture on the Web has been driven by the 
static nature of document-based information. This architecture is reflected in the use 
of the file transfer mode of document retrieval and the use of stream-based
protocols, such as TCP. However, full file transfer and TCP are unsuitable for 
continuous media, such as video and audio, for reasons which will be discussed in 
greater detail below. 

The easy-to-use, point-and-click user interfaces of WWW browsers, first
popularized by Mosaic, have been the key to the widespread adoption of HTML and
the World Wide Web by the entire Internet community. Although traditional WWW
browsers perform commendably in the static information spaces of HTML
documents, they are ill-suited for handling continuous media, such as real-time audio
and video.

Earlier Web browsers, such as Mosaic, required a user to wait until a 
document had been retrieved completely before displaying the document on the 
screen. Even at the faster transfer speeds which have become possible in
recent years, the delay between retrieval request and display has been frustrating
for many users. Particularly in view of the astronomical increase in Internet traffic,
during especially busy times, congestion over the Internet has negated at least 
some of the speed advantages users have obtained by getting faster modems. 

Video and audio files tend to be much larger than document files in many 
instances. As a result, the delay involved in waiting for an entire file to download
before it is displayed is even greater for video and audio files than for document
files. Again, during busy times, Internet congestion would make the delays
intolerable. Even in networks which are separate from the Internet, transmission of 
sizable video and audio files can result in long waits for file transfer prior to display. 



Multimedia browsers such as Mosaic have been excellent vehicles for 
browsing information spaces on the Internet that are made up of static data sets. 
Proof of this is seen in the phenomenal growth of the Web. However, attempts at 
the inclusion of video and audio in the current generation of multimedia browsers 
have been limited to transfer of pre-recorded and canned sequences that are 
retrieved as full files. While the file transfer paradigm is adequate in the arena of 
traditional information retrieval and navigation, it becomes cumbersome for real time 
data. The transfer times for video and audio files can be very large. Video and 
audio files now on the Web take minutes to hours to retrieve, thus severely limiting 
the inclusion of video and audio in current Web pages, because the latency required 
before playback begins can be unacceptably long. The file transfer method of 
browsing also assumes a fairly static and unchanging data set for which a single
unidirectional transfer is adequate for browsing some piece of information. Real-time
sessions such as videoconferences, on the other hand, are not static. Sessions
happen in real time and come and go over the course of minutes to days. 
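
To make the latency argument concrete, the following minimal sketch compares the wait before playback under full file transfer with the wait when only a short pre-buffer must arrive first. The function names, file size, link rate, media bit rate, and pre-buffer duration are illustrative assumptions and are not figures taken from this disclosure.

```python
def full_file_wait_s(file_bytes: int, link_bps: float) -> float:
    """Seconds before playback can start when the whole file must arrive first."""
    return file_bytes * 8 / link_bps


def streaming_wait_s(prebuffer_s: float, media_bps: float, link_bps: float) -> float:
    """Seconds needed to receive `prebuffer_s` seconds of media before playback.

    Assumes the link is at least as fast as the media bit rate; otherwise
    real-time playback cannot be sustained and an error is raised.
    """
    if link_bps < media_bps:
        raise ValueError("link too slow for real-time playback")
    return prebuffer_s * media_bps / link_bps


# Hypothetical numbers: a 10 MB clip encoded at 20 kbit/s, fetched over a
# 28.8 kbit/s modem, with a 2-second pre-buffer for the streaming case.
FILE_BYTES = 10 * 1_000_000
MODEM_BPS = 28_800
MEDIA_BPS = 20_000

file_wait = full_file_wait_s(FILE_BYTES, MODEM_BPS)
stream_wait = streaming_wait_s(2.0, MEDIA_BPS, MODEM_BPS)
print(f"full file transfer: {file_wait / 60:.1f} minutes before playback")
print(f"streaming start-up: {stream_wait:.1f} seconds before playback")
```

Under these assumed numbers the full-file wait is measured in tens of minutes, while the streaming start-up is on the order of a second or two, which is the gap the paragraph above describes.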

The Hypertext Transfer Protocol (HTTP) is the transfer protocol used between 
Web clients and servers for hypertext document service. The HTTP uses TCP as 
the primary protocol for reliable document transfer. TCP is unsuitable for real time 
audio and video for several reasons. 

First, TCP imposes its own flow control and windowing schemes on the data
stream. These mechanisms effectively destroy the temporal relations shared 
between video frames and audio packets. 
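
The loss of timing can be illustrated with a small sketch. It assumes frames are sent at a fixed interval and measures how far each inter-arrival gap deviates from that interval. The function name and the sample arrival times are hypothetical, chosen only to show what bursty, window-driven delivery looks like at the receiver; they are not measurements.

```python
def interframe_jitter(arrivals_s, frame_interval_s):
    """Return each inter-arrival gap's deviation from the nominal frame interval."""
    gaps = [later - earlier for earlier, later in zip(arrivals_s, arrivals_s[1:])]
    return [gap - frame_interval_s for gap in gaps]


FRAME_INTERVAL_S = 0.1  # assumed rate of 10 frames per second

# Evenly paced delivery: every gap matches the frame interval.
paced = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# Bursty delivery (assumed): the stream stalls, then several frames arrive
# back to back, so the frames no longer carry their original spacing.
bursty = [0.0, 0.1, 0.45, 0.46, 0.47, 0.5]

for name, arrivals in (("paced", paced), ("bursty", bursty)):
    jitter = interframe_jitter(arrivals, FRAME_INTERVAL_S)
    worst = max(abs(j) for j in jitter)
    print(f"{name}: worst deviation {worst * 1000:.0f} ms")
```

Both sequences deliver the same six frames in the same half second; only the spacing differs, and that spacing is what continuous media depends on.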

Second, unlike static documents and text files, in which data loss can result in 
irretrievable corruption of the files, reliable message delivery is not required for video 
and audio. Video and audio streams can tolerate frame losses. Losses are seldom
fatal, although of course they can be detrimental to picture and sound quality. TCP 
retransmission, a technique which facilitates reliable document and text transfer, 
causes further jitter and skew internally between frames and externally between 